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Abstract 

QO ■ The generalization of the geometric mean of positive scalars to positive definite 

OA ■ matrices has attracted considerable attention since the seminal work of Ando. The 

paper generalizes this framework of matrix means by proposing the definition of a rank- 
preserving mean for two or an arbitrary number of positive semi-definite matrices of 
^^ . fixed rank. The proposed mean is shown to be geometric in that it satisfies all the 

Q^ ■ expected properties of a rank-preserving geometric mean. The work is motivated by 

operations on low-rank approximations of positive definite matrices in high-dimensional 
spaces. 



1 Introduction 



^►^ , Positive definite matrices have become fundam.ental computational objects in many areas 

—^ ■ of engineering and applied mathematics. They appear as covariance matrices in statistics, 

Q^ , as variables in convex and semidefinite programming, as unknowns of important matrix 

^^ ■ (in)equalities in systems and control theory, as kernels in machine learning, and as diffusion 

'^ , tensors in medical imaging, to cite a few. These applications have motivated the development 

f"~>. ■ of a difference and differential calculus over positive definite matrices. As a most basic 

^P , operation, this calculus requires the proper definition of a mean. In particular, much research 

has been devoted to the matrix generalization of the geometric mean yob of two positive 
numbers a and b (see for instance Chapter 4 in [1] for an expository and insightful treatment of 
the subject). The further extension of a geometric mean from two to an arbitrary number of 
positive definite matrices is an active current research area :2, 3,:.4i. It has been increasingly 
recognized that from a theoretical point of view [5] as well as in numerous applications 
5^ , [3 [71 [HI [HI m [ini [21 [HI [m [S] , matrix geometric means are to be preferred to their arithmetic 

counterparts for developing a calculus in the cone of positive definite matrices. 

The fundamental and axiomatic approach of Ando [2] (see also [3]) reserves the adjective 
"geometric" to a definition of mean that enjoys all the following properties: 

(PI) Consistency with scalars: HA and B commute, then M{A,B) ~ (Ai?)^/^. 

(P2) Joint homogeneity 

M{aA,l3B) = (a/3)i/2M(yl,B). 
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(P3) Permutation invariance M(A, B) = M{B, A). 

(P4) Monotonicity. li A < Aq (i.e. (^o — A) is a. positive matrix) and B < Bq, the means 
are comparable and verify M{A, B) < M{Aq, Bq). 

(P5) Continuity from above. If {A„} and {B^} are monotonic decreasing sequence (in the 
Lowner matrix ordering) converging to A, B then hm(yl„ o _B„) = M{A, B). 

(P6) Congruence invariance. For any G G Gl(n) we have M{GAG'^ , GBG^) = GM{A, B)G^ . 

(P7) Self-duahty MiA,B)-^ = MiA-\,B-'^). 

The present paper seeks to extend geometric means defined on the open cone Pn to the the 
set of positive semi-definite matrices of fixed rank p, denoted by S^(p,n), which hes on the 
boundary of Pn. Our motivation is primarily computational: with the growing use of low-rank 
approximations of matrices as a way to retain tractability in large-scale applications, there 
is a need to extend the calculus of positive definite matrices to their low-rank counterparts. 
The classical approach in the literature is to extend the definition of a mean from the interior 
of the cone to the boundary of the cone by a continuity argument. As a consequence, this 
topic has not received much attention. This approach has however serious limitations from 
a computational viewpoint because it is not rank-preserving. For instance Ando's geometric 
mean of two semi-definite matrices having rank p < n/2 is almost surely null with this 
definition. 

We depart from this approach by viewing a rank p positive semi-definite matrix as the 
projection of a positive definite matrix in a p-dimensional subspace. Our proposed mean 
lies in the mean subspace, a well-defined and classical concept. The proposed mean is rank- 
preserving, and it possesses all the properties listed above, modulo a few adaptations imposed 
by a rank-preserving concept: (PI) is impossible unless the rank of AB is equal to the rank 
of A and B. Indeed, as the mean must preserve the rank, it can not be equal to {ABy^^ if 
the latter condition is not satisfied. Also (P6) will be shown to be impossible to retain when 
the rank is preserved. Indeed it is this property that causes Ando's geometric mean of two 
matrices of sufficiently small rank to be almost surely null. In (P7) inversion must obviously 
be replaced with pseudo-inversion. Letting Ao B denote the desired mean in S''"(p,n), we 
suggest to replace those three properties with: 



(Pf) Consistency with scalars: if A, B commute and AB has rank p, Ao B ^ i^B) 
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(P6') Invariance to scalings and rotations. For {fJ.,P) G M.*^ x 0(n) we have {fiP^AuP) o 
(fiP^B^iP) = nP^{A o B)nP. 

(P7') Self-duality {A o _B)t = A^ o B\ where f is the pseudo-inversion. 

In the recent work [14] , we used a Riemannian framework to introduce a geometric mean of 
two matrices in S+ (p, n) that was shown to satisfy those properties. The present paper further 
develops the concept by providing an inutitive characterization and a closed formula for its 
calculation. Furthermore, we show that the concept extends to the definition of a geometric 
mean for an arbitrary number of matrices, thereby providing the low-rank counterpart of 
recent work on positive definite matrices [21 |3l |4] . 

The structure of the paper is as follows. Section 2 and 3 are mainly expository. In 
Section 2, we review the theory of Ando in the cone of positive definite matrices and we 
illustrate the shortcomings of the continuity argument for a rank-preserving mean to be 
defined on the boundary of the cone. In Section 3, we review the Riemannian interpretation 
of Ando's mean of two matrices A and B as the midpoint of the geodesic joining A and B 
for the affine invariant metric of the cone and introduce the Riemannian concept of Karcher 



mean. Section 4 develops the proposed geometric mean for an arbitrary number of matrices 
in the set S+(p, n). The geometric properties of this mean are characterized in Section 5. 
Finally, Section 6 illustrates the relevance of a rank-preserving mean in the context of filtering. 
Preliminary results can be found in |15) . 

1.1 Notation 

• Pn is the set of symmetric positive definite n x n matrices. 

• S^(p, n) is the set of symmetric positive semidefinite n x n matrices of rank p < n. 

• Sym(n) is the vector space of symmetric n x n matrices. 

• St(p, n) = 0(n)/0(n — p) is the Stiefel manifold; i.e., the set oi n x p matrices with 
orthonormal columns: U'^U = Ip. 

• Gr(p, n) is the Grassman manifold, that is, the set of p-dimensional subspaces of R". 
It can be represented by the equivalence classes St(p,n)/0(p). 

• span(^) is the subspace of R" spanned by the columns of A e R"^". 

• TxM is the tangent space to the manifold Al at X. 

2 Analysis pass: Ando's approach 

2.1 Mean of two matrices Ai and A2 

For positive scalars, the homogeneity property (P2) implies M{ai,a2) = aiAf (1,02/01) — 
o-if {0-2/0,1) with / a monotone increasing continuous function. In a non-commutative matrix 
setting, one can write 

M(Ai, A2) = Al/'f{A;'/'A2A;'/')A\^^ . (1) 

with / a matrix monotone increasing function. Several geometric means can be defined in 
this way (see, e.g., [ID]). The well-established geometric mean of two full-rank matrices 
popularized by Ando [TBI [HI [2] corresponds to the case f{X) — X^^^, generalizing the scalar 
geometric mean. It writes 

A,#A2 = AY'iA-'/'A2A-'/'y/'Al/\ (2) 

It satisfies all the propositions (P1-P7) listed above. There are many equivalent definitions 
of the Ando geometric mean in the literature. 

A geometric mean satisfying ([T]) is defined for positive definite matrices, that is, for 
elements in the open cone of positive definite matrices. Rank-deficient matrices lie on the 
closure of the cone. As a consequence, the natural idea to extend a geometric mean on the 
boundary is to use a continuity argument. The resulting mean satisfies all the properties 
above (except for (P7) that must be formulated using pseudo- inversion) , but it is not rank- 
preserving. Indeed, let Ai — diag(4, e^) and A2 — diag(e^,l) where the term e ^ 1. These 
two matrices belong to P2, and their geometric mean is Ai^A2 =diag(2e, e). In the limit 
(rank-deficient) situation e — > 0, the mean becomes the null matrix diag(0, 0). The following 
proposition shows that this example is not pathological. 

Proposition 1. // (P6) is satisfied, the geometric mean of two m,atrices o/S^(p, n) is alm,ost 
surely null if p < n/2. 



Proof. In [2], it is proved (Theorem 3.3) that (P6) iraphes that the range of the geometric 
mean of Ai and A2 is the intersection of the subspaces span(Ai) and span(A2) (this can be 
proved letting a sequence of matrices of Gl(n) converge to the orthoprojector on Ker Ai). 
Since the intersection of two random subspaces of dimension p is almost surely empty as long 
as n — p > p, the range of span(yli) n span(yl2) is almost surely the null space, which proves 
the claim. D 

A rank-preserving mean thus requires a different approach. We seek to retain most of 
the properties (P1-P7), but we will see that three of them must be relaxed to define a rank- 
preserving mean. The major relaxation consists in choosing a smaller invariance group in 
(P6), replacing the general linear group Gl(n) with the smaller but meaningful group of 
scaling and rotations W^ x 0{n). 

2.2 Mean of an arbitrary number of matrices Ai,- ■ ■ , An 

A geometric mean of an arbitrary number of matrices, that extends the geometric mean of 
two matrices ([2]) is not very well-established. Indeed the definitions based on equations ([Ij 
for instance, are not easily generalized. Several possible definitions have appeared in the 
literature and we shall not review all of them. In any case, it seems natural to reserve the 
adjective geometric, to a mean that satisfies the following properties (PP1-PP7). They are a 
natural extension of (P1-P7) to the case of three matrices, and the extension to an arbitrary 
number of matrices is straightforward. (PPl) if A,B,C commute M{A,B,C) = (ABC)^/^. 
(PP2) M{aA,l3B,jC) = {a/3j)^/^M{A,B,C). (PP3) M{A,B,C) = M{Tr{A,B,C)) for any 
permutation n. (PP4) The map (A, B, C) H> M{A, B, C) is monotone. (PP5) If {An}, {B„}, 
{C„} are monotonic decreasing sequences converging to A,B,C then limM(A„, i3„, C„) = 
M{A, B, C). (PP6) For any G e Gl(n) we have M{GAG^, GBG^, GCG^) = GM{A, B, C)G^ . 
(PP7) M{A, B, C)-i = M{A-\B-\G-^). 

This axiomatic approach has been proposed in 2 , and the authors have defined a mean 
we shall denote alm(Ai, • • • , A„), adopting the notation of [1]. This mean is defined as the 
common limit of a converging sequence of matrices, and it was proved to preserve properties 
(P1-P7) as well as their extension (PP1-PP7) to three or more matrices. 

3 Geometric pass: geometric mean as a Riemannian 
mean 

Ando's mean ^ has the alternative Riemannian interpretation of the midpoint of a geodesic 
connecting the matrices A and B. This connection appears for instance in |17j . Because this 
Riemannian interpretation is at the root of our proposed rank-preserving mean, it is reviewed 
in this section. 

3.1 Riemannian mean and Karcher mean on a Riemannian manifold 

The arithmetic mean of N positive numbers in S*^ is defined as M{xi, • • • , xn) — — X^i ^i- 
It has the variational property of being the unique minimizer of the sum of squared distances 
M{xi, • • • , xn) = argmiUj. ^^ d{x, xi)^ where d is the Euclidean distance in M. In the same 
way, the geometric mean of n positive scalars minimizes the same sum if one works with the 
hyperbolic distance d{x,y) =\ log a; — logy |. 

This variational approach allows to define candidate means of an arbitrary number of 
matrices on any connected Riemannian manifold A4. Such manifolds carry the structure 
of a metric space whose distance function is the arclength of a minimizing path between 



two points. Indeed the mean of xi,- ■ -xm on A^, can be defined as the niinimizer of the 
sum of squares Yli d{x,Xi)'^ where d is the geodesic distance on M. Such a mean is known 
as the Riemannian barycenter, of Karcher or Frechet mean. When only two points are 
involved, the Karcher mean is the midpoint of the minimizing geodesic connecting those two 
points and it is usually called the Riemannian mean. The main advantage of the Karcher 
mean is to readily extend any mean that can be defined as a geodesic midpoint, to an 
arbitrary number of points. Unfortunately the mean can rarely be given in closed form, and 
is typically computed by an optimization algorithm on the manifold (see e.g. |18) for more 
information on this branch of optimization). In |19) it has been shown that the Karcher mean 
is uniquely defined on manifolds with non-positive curvatures. On arbitrary manifolds with 
upper bounded sectional curvature, the Karcher mean exists and is unique in geodesic balls 
with sufficiently small radius pO] . 

3.2 Ando's mean as a Riemannian mean in the cone ?„ 

Any positive definite matrix admits the factorization X = YY^ , Y e Gl{n), and the fac- 
torization is invariant by rotation Y i— > YO. As a consequence, the cone of positive def- 
inite matrices has a homogeneous representation Gl{n)/0{p). The space is reductive and 
thus admits a Gl(n)-invariant metric called the natural metric of the cone of positive def- 
inite matrices [5]. If Xi,X2 are two tangent vectors at A e ?„, the metric is given by 
g^{Di,D2) = Tr (£'iA~^I?2j4~^). With this definition, a geodesic (path of shortest length) 
at arbitrary yl e P^ is [13]: -fA{tX) = A^/^ e^p{tA-~^/^XA-^/^)A^^^, t > 0, and the 
corresponding geodesic distance is 

dp^iA,B) = d{A-'^^BA-^^^,I) = ||log(A-i/2BA-i/2)||f , 

= /^V(A,), 

where A^ are the generalized eigenvalues of the pencil A—XB, i.e., the roots oi det{AB^^ — XI) . 
The distance is invariant with respect to action by congruence of Gl(n) and matrix inversion. 
The geodesic characterization provides a closed-form expression of the Riemannian mean 
of two matrices A,B^Pn- The geodesic A{t) linking A and B is 

A{t) = exp^"(tX) = Ai/2exp(ilog(A-i/2BA-i/2))Ai/2, 

where A-^^^XA-^^^ = log{A-'^^^BA-^^^) e Sym(n). The midpoint is obtained for t = 1/2: 
M{A,B) ^ Ai/2(yl-i/2BA-i/2)i/2^i/2 and it corresponds to the definition ^. 

3.3 Mean of positive definite matrices and Karcher mean in the 
cone Pn 

For Ai, ..Af^f e Pn viewed as a Riemannian manifold endowed with the natural metric, the 
Karcher mean is defined as a minimizer of X H> J^i d{X,Ai)^, i.e. a least squares solution 
that we shall denote ls(Ai, • • • , A„) as in [4]. The manifold Pn endowed with the natural 
metric is complete, simply connected, and has everywhere a negative sectional curvature. As 
a consequence, the Karcher mean is uniquely defined. It has been proposed by [8] as a natural 
candidate for generalizing the Ando mean to TV matrices, and studied by [4]. It can mainly 
be calculated via a simple Newton method on Pn, or by a stochastic gradient algorithm f2I]. 
However, finding a closed-form expression of the Karcher mean of three or more matrices of 
Pn remains an open question. Several recent papers adress the issue of approximating the 
Karcher mean via simple algorithms [21 13] . 



3.4 Mean of projectors and Karcher mean in the Grassman manifold 

The Riemmanian approach to the definition of means provides a natural definition for the 
mean of p-dimensional projectors in M", which forms a subset of S"'"(p,n): 

{P e R"^"/ P^ = P, P^ = P, Tr (P) = p}, (3) 

This set is in bijection with the Grassman manifold of p-dimensional subspaces Gr(p,n) (e.g. 
[18]). This manifold can be endowed with a natural Riemannian structure. The squared 
distance between two subspaces is merely the sum of the squares of the principal angles 
between those two p-planes. The Riemannian mean of two subspaces is uniquely defined as 
soon as all the principal angles between those subspaces are stricly smaller than 7r/2. In 
other words, the injectivity radius at any point, i.e. roughly speaking the distance at which 
the geodesies cease to be minimizing, is equal to 7r/2 on this manifold. The Karcher mean 
of N subspaces Si, ■ ■ ■ Sn of Gr(p, n) is defined as the least squares solution that minimizes 

X H> J2i '^Gr(p,n)(-'^i'S'i)^- This latter quantity is equal to J2i=iJ2^=i^ij where % is the 
j-th principal angle between X and Si. For N > 2, there is no closed- form solution for 
the mean subspace X. For this reason, the Riemannian mean is often approximated by the 
chordal mean ^5\, see Section [6l As it is well-known the sectional curvature of the Grassman 
manifold does not exceed 2, and the injectivity radius is n/2, we have guarantees that the 
Karcher mean exists and is uniquely defined in a geodesic ball of radius less than 7r/(4v2), 
see [20]. 

The Karcher mean of projectors in Gr(p, n) is a natural rank-preserving rotation-invariant 
mean that is well-defined on a subset of the boundary of the cone. We will use this mean 
subspace as a basis for the mean of N matrices of S+(p, n). 

4 A rank-preserving mean of an arbitrary number of 
matrices of S+(p, n) 

The extension of the mean from projectors to arbitrary matrices of S^(p, n) is based on the 
decomposition 

A ^ UR^U^, 

of any matrix A G S^(p,n), with U an orthonormal matrix in St(p,n) and R^ a positive 
definite matrix in Pp. This matrix decomposition emphasizes the geometric interpretation 
of elements of S^(p, n) as flat p-dimensional ellipsoids in R". The flat ellipsoid belongs to a 
p-dimensional subspace spanned by the columns of U, which form an orthonormal basis of 
the subspace, whereas the px p positive definite matrix R^ defines the shape of the ellipsoid 
in the low rank cone Pp. The matrix decomposition Ai = UiR^U'f is defined up to an 
orthogonal transformation 

U ^ UO, p2 ^ o^R'^0 , 

with O £ 0(p) since 

Ai = UiRJUf = UiOiO^R'^0)O^Ui. 

The orthogonal transformation does not affect the Grassman Riemannian mean but does 
affect, in general, the mean of the low-rank factors since Af(Pf,P|) 7^ M{R\,O^R^O) 
where M is the Ando mean. Principal difficulties for defining a proper geometric mean stem 
from this ambiguity. 



Stiefel manifold = total space 



'Uj^O(p) / U20(p) 
Grassman manifold = quotient space 

Figure 1: (Yi, I2) are two bases of the subspaces spanned by the columns of Ui and U2 (the 
fibers) that minimize the distance in St(p,n). The dashed hne represents the shortest path 
between those two fibers, thus its horizontal lift (i.e. its projection) in Gr(p, n) viewed as a 
quotient manifold, is a geodesic. 



4.1 Mean of two matrices Ai and A2 

Let Ai = UiRfUi and A2 = U2R2U2 be elements of S+(p,n). The representatives of the 
two matrices {Ui, Rf),i — 1,2, are defined up to an orthogonal transformation 

C/, ^ U.,0, Rf ^ O'^RfO. 

All the bases C/iO(p) correspond to the same p-dimensional subspace UiU^ (Figure [1]). Note 
that, this representation of a p-dimensional subspace as the set of bases UiO{p) is at the core 
of the definition of the Grassman manifold Gr(p, n) as a quotient manifold [22] 

Gr(p,n)«St(p,n)/0(p). 

The equivalence classes 1/^0 (p) are called the "fibers". 

We will systematically assume that the principal angles between span(L/i) and span(C/2) 
are less than tt/2, which is almost surely true if the subspaces span([/i) and span(C/2) are 
picked randomly. In the case of two matrices, this is sufficient to ensure their Karcher mean 
in Gr(p,n) is unique. To remove the ambiguity in the definition of a mean of two PSD 
matrices, we propose to pick two particular representatives Yi — UiQi and Y2 = U2Q2 in 
the fibers C/iO(p) and U20{p) by imposing that their distance in St(p,n) does not exceed 
the Grassman distance between the fibers they generate: 

dst{p,n){Yi,Y2) = (iGr(pai)(span(C/i),span(C/2)) , (4) 

Because the projection from St(p, n) to Gr(p,n) is a Riemannian submersion |18j . and Rieman- 
nian submersions shorten the distances [26j , this condition admits the equivalent formulation 

(Qi, Q2) = argminjo^ o^)go(p)'^st(p,n)(t^iOi, U2O2) ■ (5) 

which is illustrated by the picture of Figure [1] a geodesic in the Grasmman manifold admits 
the representation of a horizontal geodesic in St(p, n), that is, in the present case, a geodesic 
whose tangent vector points everywhere to a direction normal to the fiber. 
The following proposition solves equation ([5]). 



Proposition 2. The compact SVD of U1U2 writes 

[/ft/2 = Oi(cosI])0^. (6) 

where T, is a diagonal matrix whose entries are the principal angles between the p dimensional 
subspaces 12Sf . If the pair {Oi,02) is defined via ^, it is a solution of ([S]). 

Proof. We use a well-known result in the Grassman manifold: the shortest path between two 
fibers in St(p,n) concides with the geodesic path linking these two fibers in Gr(p,n), as the 
projection on the Grassman manifold is a Riemannian submersion, and thus shortens the 
distances (see [23J[5^ for results on quotient manifolds). If two bases Yi and I2 of the fibers 
UiO{p) and U20{p) are the endpoints of a geodesic in the Grassman manifold, they must 
minimize ([S]). It is thus sufficient to prove that Yi = UiOi and Y2 = U2O2 where Oi, O2 are 
defined via ^ are the endpoints of a Grassman geodesic. 

A geodesic in the Grassman manifold with Yi as starting point and A as tangent vector 
admits the general form [35] 

'y{t) = YiV cos QtV^ + U sin etV^ , (7) 

where UQV^ = A is the compact SVD of A. We thus propose to define the curve 

Y{t) = Yi cos J:t + X sin St . 

To define X we first assume all principal angles, i.e., all diagonal entries of S are strictly 
positive. Then we let X = (Y2 — Yi cosE)(sinI])^^. The curve Y(i) is a geodesic, as it is 
of the form ([7]) with A — XE which is a tangent vector as YjJ^A — (since Yj^Y2 = cosS), 
and XT, is a compact SVD as X'^X = /. This is because X'^X = (Y^ - cos'EY-^){Y2 - 
Yi cos E)(sin I])-^ ^ (/_ (cos I])2)(sin E)-^ = / where we used the fact that Y.^Yi = Y^Y2 = 
cosE. Yi and Y2 are its endpoints indeed as Y2 = Y(l). If there are null principal angles, 
it is clear that Y{t) is a geodesic, where X = (Y2 — Yi cosE)(sinE)^ along the directions 
corresponding to non-zero principal angles, and where X can be completed arbitrarily with 
orthonormal vectors along the directions corresponding to null principal angles. Indeed, along 
those directions Yi and Y2, and thus Y{t) coincide, and the value of X does not play any 
role in the definition of Y{t). 

D 

The following result allows to understand why the choice of the specific bases Yi , Y2 is 
relevant for defining a geometric mean, as explained in the end of this subsection. It proves 
the rotation of minimal energy (i.e. the closest to identity) mapping the subspace span(j4i) 
to span(y42) maps Yi to Y2. 

Proposition 3. Let Yi = UiQi and Y2 = U2Q2 with {Qi,Q2) a solution of ^. Then the 
rotation R G SO(n) that maps the basis Yi to the basis Y2 = RYi is a rotation of minimal 
energy, that is, it minimizes dso(n)(-R;-^) among all rotation matrices that map Yi to the 
subspace span(Y2). 

Proof. One can assume without loss of generality that Yi — [ei,- ■ ■ , e^] where (ei, • • • , e„) is 
the canonical basis of ffi.". Moreover, the search space can then be restricted to the rotations 
whose r first columns are of the form Y2O, whereas the n — r remaining columns coincide with 
the identity matrix, as the rotation seeked must minimize the distance to identity. Any such 
rotation mapping Yi to Y2O has its first columns equal to Y2O and coincides with identity on 
the last n — r columns. Thus we have for any such rotation dgf_/^^^\{R, I) — dst(p.n)(^20,/). 
But as SO(n) = St(n, n) and the metrics also coincide, we have dso(n)(-Ri I) = '^st(n,n)(-Ri -^)- 
Thus the problem boils down to ([5]) and is solved taking O = I. D 



Having identified representatives as endpoints of a geodesic in Gr(p,n), their Riemannian 
mean in the Stiefel manifold (and in the Grassman manifold) is now easily written as the 
midpoint of the geodesic: 

Y,oY2^Y{t=^)^Y,cos^+Xsm^. (8) 

Note that a weighted mean can be also computed using the geodesic parameterization: 

Yi o 12 = Y{a) = Yi cos(aE) + X sin(al]) , (9) 

where the weight given to Yi is 1 — a and the weight given to Y2 is a. 

Once Yi and Y2 have been computed, Ri and R2 are given by the corresponding repre- 
sentatives 

Rl = Y^AiYu Rl = YIA2Y2. (10) 

The proposed mean of two matrices Ai , A2 is then given by 

Ai o A2 = W{Rl#Rl)W'^ 

where W is the Riemannian mean of Yi and Y2 and R\ij^R\ is the Ando mean ^ of i?J and 
Rl in Pp. 

Proposition [3] provides a simple geometrical intuition underlying the definition of the 
mean: the mean of two flat ellipsoids Ai and A2 is defined in the mean subspace as the 
geometric mean of two full p-dimensional ellipsoids R\ and R\. There are several ways to 
rotate the ellipsoid Ai into the subspace spanned by A2 . Different rotations will yield different 
respective positions of the two final ellipsoids. The choice is made univoque and sensible by 
selecting the minimal rotation. The rotated ellipsoid then merely writes Y2i?^Y2^. Thus R\ 
and i?2 define the shape of the ellipsoids expressed in the same basis Y2 . Figure [2] illustrates 
the proposed mean of two flat ellipsoids of 5^(2, 3). 

4.2 Generalization to A^ matrices Ai, ■ ■ ■ , A^ E S+(p,n) 

The construction presented in the previous section for two matrices is now extended to an 
arbitrary number of matrices. The main idea is to define a mean p-dimensional subspace and 
to bring all flat ellipsoids Ai,- ■ ■ , A^ to this mean subspace by a minimal rotation. In the 
common subspace, the problem boils down to compute the geometrical mean of N matrices 
in Pp. The construction is achieved through the following three steps: 

1. Let Ai — UiR^Uf ioT 1 <i < N . Suppose that the subspaces spanned by the columns 
of the AiS are enclosed in a geodesic ball of radius less than 7r/(4\/2) in Gr(p,n). Then 
define the projector W G St(p,n) as a basis of their unique Karcher mean. 

2. For each i, compute two bases Yi and Wi of (respectively) span(C/i) and span(VF) 
such that dst{p,n){Xi^^i) = c?Gr(p,n)(span(C/i),span(Vl^)) i.e. solve problem ([5]). This 
is illustrated on Figure [31 Let Sf — Y^AiYi. The ellipsoid Ai rotated to the mean 
subspace writes WiSfWf . 

3. Let AI denote the geometric mean aim or Is on Pp. For each 1 < i < TV let T^^ = 
WQWiSfWfWo e Pp where Wq € St(p,n) is a fixed basis of the mean subspace. The 
geometric mean of the matrices Ai,- ■ ■ , A]y is defined the following way: 

Ai o ... oAn = Wo[M{Ti, ■■■ , T^)]W;[ . (11) 




Figure 2: Proposed mean in S'+(2,3). The proposed mean is such that both eUipsoids are 
brought to the mean subspace via a rotation of minimal energy, and then averaged. The 
resulting mean is a flat ellipsoid that lives in the mean subspace. 



Mean subspace 




UiO(p) 



WO{p) 



U,0(p) 



U30(p) 



Figure 3: The bases Yi,- ■ ■ , Yn of the fibers and the bases Wi,- ■ ■ , Wn of the mean subspace 
fiber are such that {Yi,Wi) are the endpoints of a geodesic in the Grassman manifold. 
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The approach can summarized as follows: in 1. a mean subspace is computed, in 2. the 
ellipsoids are rotated to this subspace, in 3. they are all expressed in a common basis Wo so 
that their geometric mean can be computed in the small dimensional cone. An algorithmic 
implementation is proposed in Section 6.1. 

5 Properties of the proposed mean of A^ matrices of 

S+(p,n) 

Throughout this section 

• M will systematically denote one of the two geometric means aim or Is on Pp. 

• it will be systematically assumed the subspaces spanned by the columns of Ai, ...Apf S 
S+(p,n) belong to a geodesic baU of radius less than n/{4:y/2) in Gr(p,n), so that the 
Karcher mean of these subspaces is well-defined and unique. 



• 



with a slight abuse of notation, any projector UU'^ where U eSt(p,n) will systematically 
be considered as an element of Gr(p,n), i.e. as a subspace. 



5.1 Analytic properties 

Proposition 4. On the set of rank p projectors, the mean (jlip coincides with the Grassman 
Riemannian mean. On the other hand, when the matrices in S^(p, n) all have the same range, 
pip coincides with the geometric mean induced by the geometric mean M on the common 
range subspace of dimension p. More generally (jlip coincides with M on the intersection of 
the ranges. 

Proof. The first two properties are obvious. The last one is linked to the special choice of a 
minimal energy rotation. Indeed, on the intersection of the ranges, the rotation of minimal 
energy is the identity. D 

The next proposition proves that the proposed mean inherits the several properties 
of a geometric mean in the cone. For the reasons explained in the introduction of the 
paper. Properties (PPl) and (PP6-PP7), defined for the mean of three or more matri- 
ces in Subsection 12. 2[ must be adapted as follows: (PPl') if A,B,C commute and ABC 
has rank p, then {A o B o C) = {A,B,C)^/^. (PP6') For (^, P) G R^ x 0(n) we have 
[iiP'^ AiiP)o{^ipT BiiP)o{^ipT C iiP) = n^P'^{AoBoC)P. (PP7') {AoBoCy = A^B^oC^. 

Proposition 5. The mean ()lip with M = aim is well-defined, and deserves the appellation 
"geometric" as it satisfies the properties (PPl'), (PP2-PP5), and (PP6'-PP7'). The result 
also holds with M = Is, except that (PP4) can only be conjectured in this case. 

Proof. (PPl'): if Ai o • • • o An has rank p, and Ai,- ■ ■ , An commute, then all A^'s span the 
same ji-dimensional subspace. As a result, they all have the same range. On this common 
range, the mean has been proved to coincide with M, see Proposition |4l It thus inherits the 
(PPl) property. 

M satisfies (PP2) and so does ([TT]) . To prove permutation invariance (PP3) it suffices 
to note that both Grassman mean and M are unvariant by permutation. To prove (PP4), 
suppose Ai < A" for each i. Then Ai's and A'^ have the same range and they all admit a 
factorization of the type WRlfW^ . (PP4) is then a mere consequence of the monotonicity 
of M. This property was proved for M=alm in [2] and it was conjectured for M=ls in [1]. 
Using the same arguments, one can prove continuity from above of the mean is a consequence 
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of continuity of M . (PP7') can be easily proved noting that for each i the pseudo-inverse 
writes A] — UiRJ^Uf . Thus the calculation of the mean of the pseudo-inverse yields the 
inverse T[^ 's of the T^'s and (PP7') is the consequence of self-duality of M. 

(PP6'): As for all /i > and i we have ^Ai = Yi{ixIi^)Y^ invariance with respect to 
scaling is a mere consequence of the invariance of M . Let O € 0(n). The mean subspace 
in Grassman of the rotated ranges {OAiO^ys is the rotated mean subspace of the range of 
the AiS. Proposition [5] says that Y^Wi = cosE But for every i we have {Y^'^0'^){OWi) = 
Y^Wi = cosE. Thus the matrices are transformed according to Wi i— >■ OWi and the Tls are 
unchanged. The mean of the rotated matrices is thus OWq G(Ti, ...,Tm) WqO^. D 

5.2 The proposed geometric mean as a Karcher mean in a special 
case 

In the recent work [14 , the authors proposed an extension of the affine- invariant metric of 
the cone to S'^(p, n). In this subsection, we explore the links between the Karcher mean in 
the sense of this metric and the proposed mean (fTTj). We underline the fact that the proposed 
mean is not the Karcher mean in the cone. Yet, we prove that both means coincide in the 
meaningful case where all the matrices are rank p projectors. 

The metric introduced in [T3] is as follows. If {U,R'^) G St(p,n) x Pp represents A £ 
S+(p, n), the tangent vectors of TAS"'"(p,n) are represented by the infinitesimal variation 
(A, D), where 

D = RDqR, 



such that U± G St(n, n - p) , U'^U± = 0, and Do G Sym(p) = T/Pp. Vectors of the form (fT2| 
constitute a subset of tangent vectors to the total space St(p,n) x Pp. This subset is called 
the horizontal space, and is defined such that each tangent of the horizontal space defines a 
unique tangent vector in the quotient S"'"(p,n) (i.e. the horizontal space is transverse to the 
fibers). The chosen metric of S"'"(p,n) needs only be defined on the horizontal space, and is 
merely the weighted sum of the infinitesimal distance in Gr(p, n) and in Ppi 

9kiu,R-)ii^i.Di), (A2, ^2)) = Tr (Af A2) + A: Tr {R-' D^R-^ D2R-^) , fc > 0, (13) 

generalizing g^" in a natural way. The space S+(p,n) = (St(p,n) x Pp)/0(p) endowed 
with the metric p3p is a Riemannian manifold, and the metric is invariant to orthogonal 
transformations, scalings, and pseudo-inversion. 

Proposition 6. Consider N rank p projectors UiUi , ■ ■ ■ UnUJj G S^(p, n). Then the mean 
(jlip is the Karcher mean of UiU^ , ■ ■ ■ UmUJ^ in the sense of metric ([1^ 



The proof of this propostion is based on two lemmas. Indeed, this result stems from 
the fact that pT|) is of the form WW^ , where this latter projector is the Karcher mean 
of the A'' projectors in the sense of the Gr(p,n) natural metric. This means WW^ is the 
minimizer of the cost G{VV'^) = J2i ^Gr(p n)(^*^i^' ^^'^)- But the Karcher mean in S+(p, n) 
is defined as the minimizer of the cost F{X) — X]j'^s+( n)i^i^T t-^)- ^^^ first following 
lemma, proves that F{X) > G'(span(X)) for all X G S+(p, n). Thus for all X we have 
F{X) > G{WW^). But the second following lemma proves that G{WW'^) = F{WW'^). As 
a result, WW^ G S^(p,n) minimizes F. 

Lemma 1. The distance between arbitrary Ai^ A2 in S"^(p, n) is lower bounded by the distance 
between their ranges in the Grassman manifold: dg+(p „)(Ai, A2) > (iGr(p,n)(span(yli), span(A2)) 
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Proof. A horizontal curve (f/(i), i?(i)) has length /^ J gk[int),R(t)){U{t),R{t))dt. For two 

matrices Ai,A2 € S"'"(p, n), consider the horizontal lift {U{t),R{t)) of the geodesic linking 
Ai and A2 in S+(p,n) in the sense of metric (fT3| . As the horizontal vector {U(t),R{t)) 
has a shorther norm in the tangent space than the horizontal vector ([/(i),0), we have 
ds+{p^n){Ai,A2) > dst(p,n)(t/(0), [/(!))■ Besides, U{t) defines a curve in St(p,n) linking 
span(Ai) and span(A2). As the projection from the Stiefel manifold to the Grassman man- 
ifold viewed as a quotient space Gr(p,n) ~ St(p,n)/0(p) is a Riemannian submersion, it 
shortens the distances [26]. i.e. dGi.(p,n)(span(Ai), span(yl2)) < dst{p,n){U{0),U{l)). This 
proves the result. D 

Lemma 2. In the particular case where Ai,A2 in S^(p,n)are two projectors, the geodesic 
joining them in S^(p,n) coincides with the geodesic joining their ranges in Gr(p,n). It 
implies ds+{p,n){Ai, A2) = dGr(p,n)(span(Ai), span(A2)). 

Proof. One can find a horizontal curve in S+(p, n) whose length is dGr(p,n)(^ij A2), by choos- 
ing representatives in St(p, n) as in Proposition [21 It is thus a geodesic in Grassman, but 
also in S+(p,n) because of Lemma [TJ D 

Beyond the particular case of projectors, it must be emphasized that the mean (|11[) is not 
the Karcher mean in the sense of metric (TTSl) . This is because a horizontal curve {U{t),R{t)) 
that is made of a geodesic U{t) in Grasmman and of a geodesic R{t) in the cone does not 
define a geodesic in St(p,n). For instance, it is possible to construct a geodesic joining 
matrices having the same range, and such that the mid-point does not have the same range 
(see [13], Proposition 1). This counter-example shows Proposition |H although very natural, 
is not satisfied by the Karcher mean, as the mean of matrices having the same range does 
not boil down to their geometric mean within this range. Even if the metric seems natural, 
and has been successfully used in several applications (e.g., [27j[28]), the resulting Karcher 
mean lacks elementary expectable properties that the mean (jlip possesses. 

6 Application to filtering 

6.1 Algorithmic implementation and computational cost 

Here we recap the basic steps for an implementation of the mean. The calculation of the 
mean has a numerical complexity of order 0{np^). This cost is linear with respect to n, a 
very desirable feature in large-scale applications where n 3> p. 

1 . For 1 < i < A^ let C/i be any orthonormal basis of the span of Ai . 

2. Let W be an orthonormal basis of the subspace that is the Karcher mean in the Grass- 
man manifold between the associated subspaces. Instead of minimizing J2i=i S?=i ^?j' 

an interesting alternative is to minimize X]i=i X]fci(^i'^^u)^' which corresponds to 
approximate the angular distance by a chordal distance. Both definitions are asymp- 
totically equivalent for small principal angles. In this case, the solution was shown in 
[25] to be the p-dimensional dominant subspace of the centroid J2i=i ^iU^ i which can 
easily be found by truncated SVD. 

3. For 1 < i < A^ 

• The SVD of U^W yields two orthogonal matrices 0„0f such that OfU^WOj^ 
is a diagonal matrix. 
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• Let Yi = UiOi and W, = WOf . Let S} = Y^ A^Yi. Let T^ = W^W.SlWfW . 

4. Compute the geometric mean in the low-rank cone M{Ti, ■ ■ ■ ,r|) using methods in 
the hterature 2, 21]. 

5. The geometric mean is: W M{T^, ■ ■ ■ ,T|) W^ . 

6.2 Geometric means and filtering applications 

Filtering on S^{n, n) with the metric P^ (which is the GL(n)-invariant metric of the cone 
Pn) was studied extensively for diffusion tensor images (DTI) filtering in [BJEllin]. It was 
also applied to signal processing in [13 , and also seems to be promising in radar processing 
[29] . One of the main benefits of this metric is its invariance with respect to scalings which 
makes it very robust to outliers, i.e. large noise, as the effect of a large eigenvalue is mitigated 
by the geometric mean. This very property, which is desirable for means in the interior of 
the cone, yields a great lack of robustness to (even small) noises as soon as some matrices 
are rank-deficient. The mean (jlip inherits the invariance to scalings property, which yields 
robustness to outliers, without being subject to the same problems in case of rank deficiency, 
as illustrated by the following proposition. 

Proposition 7. Let A e S^(p,n), and R^ he a rotation of magnitude e. If spa,n{R^A) D 
span(^) — $, which can be the case with arbitrary small e > as soon as p < n/2, the 
Ando mean of A and R^A is the null matrix according to Propostion 1. On the other hand, 
A o R^A -^ A when e ^ 0. 

This proposition shows that the Ando mean of a stream of noisy measurements R^A of 
the matrix A, is generally the null matrix, even with arbitrarily small noises, whereas it 
should be close to A. On the other hand, it is indeed the case when the rank-preserving 
metric proposed in this paper is used. This appears to be a fundamental feature in filtering 
applications. 

6.3 A toy example: filtering a constant rank p positive semi-definite 
matrix 

In continuous-time, a first-order filter meant to filter a constant noisy signal y{t) writes 

d 

T—x = -x + y. 

In discrete-time, using a semi-implicit numerical scheme, it becomes 

y,dt + Tx, 

Xi+l = --TT- > 14) 

at + T 

which is a weighted mean between the measured signal yt and the filtered signal Xi. Thus a 
weighted version of the mean proposed in this paper can be used to do filtering. Indeed if 
the usual matrix arithmetic mean is used, the rank is not preserved. 

For the sake of illustration, we apply this idea to a weighted version of the mean proposed 
in this paper applied to the filtering of a constant matrix zz'^ of S^(l, 2), where z S R^ is a 
constant vector. Let Y{t) ^ {z + i'{t)){z + v{t))'^ & S+(l, 2) where i/(t) € K^ is a Gaussian 
white noise of magnitude 50% of the signal. The filtering algorithm based on a weighted 
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mean corresponding to the metric (jl3|) writes: 



Ui+l 



. .T0, + eY,dt. . .Te^ + OyM.. 

(cos( n , , ).sm( )), 



dt 



dt 



2 , T log rf + di logs? 

^'+i=^"P( dTTT )^ 



(15) 
(16) 



s'w,w,f 



where the current estimated matrix is rfuiuj with ui — (cos 0^, sin 0^), and Yi 

where Vi = (cos 0y. , sin ^y^ ) . The Grassman mean was computed using ([9|) . The result is 

represented on figure 21 









Figure 4: Filtering on 5*^(1, 2): plot of the 3 coefficients of the measured matrix (dashed 
line) and the filtered matrix (plain line) with a 50% measurement noise, and r = 50di. The 
filter allows to denoise the measured rank-1 symmetric matrix. 



As mentioned earlier, the geometric mean in the cone has proved useful in several filtering 
applications due to its remarkable robustness to outliers. Here, it is easy to see for instance 
in the S'+(l,2) case looking at update p^ . that an outlier having a very large amplitude 
will be crushed thanks to the logarithm function. Such a property is inherited by the metric 
p^ for filtering on S+(p,n). See Figure [5] for an experimental illustration of the property. 
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Figure 5: Above : same setting as Figure |4] but an outlier is inserted at each 10 steps. Below 
: zoom on the filtered signals (to be compared to Figure |4]). The filter shows good robustness 
properties to outliers. 
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